Abstract

The Watson-Barker Listening Test (WBLT) is one of the most popular measures of listening 
comprehension. However, participants in studies utilizing this scale have been almost 
exclusively Anglo-American. At the same time, previous research questions the psychometric 
properties of the test. This study addressed both of these issues by testing the psychometric 
properties of the scale with Hispanic-American postsecondary students. Results suggest that the 
measure does not meet the proposed five-factor structure and that the items hold little 
relationship to one another. Thus we recommend researchers and educators choose alternative 
means of measuring listening comprehension. 

Almost from its inception as a distinct field, listening researchers focused on identifying 
components of “good” listening (Bostrom, 2011). One of the primary components believed to 
constitute good listening is comprehension. In response to this belief, numerous measures of 
listening comprehension have been proposed (e.g., Brown-Carlsen Listening Comprehension 
Test, Communication Competency Assessment Instrument, Kentucky Comprehensive Listening 
Test, Watson-Barker Listening Test) (Bostrom & Waldhart, 1983; Brown & Carlsen, 1955; 
Rubin, 1982a, 1982b; Watson & Barker, 1988). The Watson-Barker Listening Test (WBLT) has 
emerged as the most utilized measure of listening comprehension by listening scholars, 
consultants, and teachers. 

Given the close connection between ethnicity and communication (Gudykunst, 2002), and the 
rapidly changing ethnic composition of the United States, understanding the impact of cultural 
and ethnic differences on communication has become increasingly important. Nonetheless, our 
understanding of how ethnic differences may affect listening skills and attitudes is woefully 
underdeveloped. Studies utilizing the WBLT (as well as other listening measures) have relied 
almost exclusively on Anglo-American participants. As Little (1997) and Keaton and Bodie 
(2013) note, scale properties may change across different populations. Thus, it behooves 
listening scholars to continuously evaluate the validity and reliability of listening measures, 
including what populations and contexts are appropriate for them. 

In addition to the general lack of research with other populations, several studies have questioned 
the psychometric properties of the current version of the WBLT, especially given that significant 
changes have been made across the various versions since the test was developed (see, for 
example, Worthington, Fitch-Hauser, Cook, & Powers, 2009; Bodie, Worthington, & Fitch- 
Hauser, 2011). Psychometrically sound instruments are a necessity if listening scholars are 
accurately to describe and explain listening processes (Keaton & Bodie, 2013). 

We address both of the above issues in this study. First, our study provides an additional test of 
the psychometric properties of the WBLT. Second, because Hispanic-Americans are one of the 
fastest growing ethnic groups in the US, composing approximately 16.3% of the US population 
(Humes, Jones & Ramirez, 2011), we chose to use Hispanic-American postsecondary students as 
participants in our study. 


Watson-Barker Listening Test 

The Watson-Barker Listening Test was developed in 1982 as a means to measure adult listening 
behavior (Watson & Barker, 1984, 1988; Watson, Barker, Roberts & Roberts, 2001). Presented 
via video, the WBLT tests for five different listening abilities: interpretation of meaning, 
interpretation of emotion, understanding, recall, and the ability to follow instructions. In 
addition, each section is designed to test a listener’s ability in both short-term and long-term 
listening contexts (e.g., conversations, lectures, etc.). Watson et al. (2001) contend that the test 
focuses on the types of listening adults may face in professional settings. 

The scale has been used in a variety of contexts (e.g., education and business) and studies 
(Applegate & Campbell, 1985; Bommelje, Houston, & Smither, 2003; Clark, 1989; Fitch- 
Hauser, Powers, O’Brien, & Hanson, 2007; Roach & Fitch-Hauser, 1984; Vierthaler & Barker, 
1985; Villaume & Brown, 1999; Watson & Rhodes, 1988; see also, Watson et al., 2001, for a 
review). It is one of the most utilized classroom and workshop measurements, with students 
taking it before and after exposure to listening instruction. This type of usage often occurs in 
classes and workshops dedicated to improving listening competency. 

The developers of the WBLT, recognizing the complexity of the listening process, acknowledge 
that the test accounts for only a relatively small amount of variance. This complexity 
also likely explains why a number of studies have found the WBLT lacking in overall 
convergent and discriminant validity (cf. Applegate & Campbell, 1985; Bodie, Worthington, & 
Fitch-Hauser, 2011; Fitch-Hauser & Hughes, 1986, 1992; Roberts, 1985; Rubin & Roberts, 
1987; Villaume & Weaver, 1996). Nevertheless, the measure continues to be used in research, 
classroom, and professional settings.

The most serious questions about the psychometric properties of the scale were raised by Bodie et 
al. (2011). Reporting the results of a confirmatory test of Form C of the WBLT, they found that 
their data did not fit the model originally proposed by Watson and Barker. They 
also tested a second-order model and a unidimensional model. They concluded that the models 
tested were essentially no better than the independence model. Thus, with their student 
population, they found little association across the 40 items of the WBLT-C. The Bodie et al. 
study consisted of 208 participants: 181 Caucasian students, 23 African-American students, and 
the remaining participants self-identified as a variety of other ethnic groups (e.g., Asian, 
Hispanic, multiethnic, etc.). 

However, as noted earlier, researchers suggest that scale properties may change with different 
populations (Keaton & Bodie, 2013; Little, 1997). Therefore, the goal of this study is to further 
test the psychometric properties of the WBLT-C with Hispanic-American postsecondary student 
participants. 

Method 


Participants 

Participants attended a Southwestern US university. Only participants who self-identified as 
Hispanic were included in this analysis (n = 214). Hispanic participants self-identified by a 
question that asked the person's origin or descent. More specifically, Hispanic respondents, no 
matter their race, “were defined as persons of Hispanic origin, in particular, those who indicated 
that their origin was Mexican, Puerto Rican, Cuban, Central or South American, or some other 
Hispanic origin,” reflecting the definition provided by the US Census Bureau (“Hispanic 
Population,” 2011). 

Of the 214 participants, 147 were male (64%) and 82% were full time students. Participants 
ranged in age from 19 to 43 years, with an average age of 22.11 (SD = 3.6); 62% were first-year 
students and 30% sophomores. Approximately 67% of participants indicated they were bilingual 
(English/Spanish). 

Procedures 

Data were collected as part of a larger study investigating listening comprehension and additional 
listening and communication variables in a single hour-long session. At this session, the 
participants first reviewed an informed consent statement. Next, they viewed a video recording 
of the Watson-Barker Listening Test (Form C).¹ After viewing the video, they completed the 
WBLT scoring sheet as well as a short survey consisting of additional attitudinal, listening, and 
demographic items. 

Instruments 

The Watson Barker Listening Test (Form C) (Watson et al., 2001) consists of 40 items and is 
designed to measure five aspects of listening comprehension: Interpreting message content, 
understanding meaning in conversations, remembering lecture information, interpreting 
emotional meaning, and following directions and instructions. The test is administered in 
English via a video recording. Following the presentation of the stimulus materials, participants 
complete a 40-item questionnaire (eight questions for each of the five areas). Participants are 
instructed to mark the correct answer on a written scoring sheet. Participant answers are scored 
as either correct or incorrect. An overall score is also computed. Table 1 reports general 
descriptive statistics as well as the number of correct and incorrect items for each subscale and 
for the comprehensive score. 

Internal consistency was estimated using Cronbach’s α for the original five-factor structure (Part 
I, Evaluating Interpreting Message Content, α = .39; Part II, Understanding Meaning in 
Conversation, α = .29; Part III, Understanding and Remembering Lecture Information, α = .44; 
Part IV, Evaluating Interpreting Emotional Meaning in Messages, α = .44; Part V, Following 
Instructions and Directions, α = .39) and a unidimensional structure (α = .70), as previously tested 
by Bodie et al. (2011). Additional analysis revealed that the items within the original proposed 
factors are not highly correlated (see Table 2). Therefore, results concerning the five-factor 
structure should be interpreted conservatively. 
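
For readers who want to see how such estimates arise, the following is a minimal sketch and not the authors' analysis script. It assumes item responses are coded 1 (correct) or 0 (incorrect), in which case raw Cronbach's α is equivalent to KR-20. The function names and the toy data are hypothetical. The second function gives the standardized α implied by a number of items and an average inter-item correlation; raw and standardized α generally differ somewhat, so the values reported above need not be reproduced exactly from the averages in Table 2.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Raw Cronbach's alpha for a respondents-by-items score matrix.

    With 0/1 (incorrect/correct) items this is equivalent to KR-20.
    """
    k = len(item_scores[0])  # number of items
    # Variance of each item (column) across respondents
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the respondents' total scores
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standardized_alpha(k, mean_r):
    """Standardized alpha from item count k and average inter-item correlation."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Hypothetical toy data: 5 respondents x 4 dichotomous items (not study data)
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(toy), 2))
# Eight items with an average inter-item correlation of .09
print(round(standardized_alpha(8, 0.09), 2))
```

The standardized formula makes the underlying problem visible: with only eight items per subscale, average inter-item correlations below .10 cannot yield an α much above .45.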

Results 


Preliminary Analyses 

Prior to running the primary analyses, data were inspected for adherence to statistical 
assumptions (Tabachnick & Fidell, 2007). With N = 214 and alpha set to .05, statistical power 
was .43 to detect small correlational effects (r = .10) and exceeded .99 for medium (r = .30) and 
large (r = .50) effects. Furthermore, the data set was sufficiently powered to assess model fit and 
parameter estimates (based upon recommendations from Hu, Bentler, & Kano, 1992). 
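
The power figures above can be approximated with the Fisher z transformation. The sketch below is an illustration under stated assumptions, not the procedure the authors used, which is not described. It assumes a normal approximation and, by default, a one-tailed test; under that assumption the approximation gives a value of roughly .43 for r = .10, whereas a two-tailed test would give a lower value. The function name and its parameters are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05, two_tailed=False):
    """Approximate power to detect a population correlation r with n cases.

    Uses the Fisher z approximation: atanh(r) has standard error
    1 / sqrt(n - 3). For a two-tailed test, the negligible rejection
    probability in the opposite tail is ignored.
    """
    std_normal = NormalDist()
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_crit = std_normal.inv_cdf(1 - tail_alpha)
    noncentrality = atanh(r) * sqrt(n - 3)
    return 1 - std_normal.cdf(z_crit - noncentrality)


for effect in (0.10, 0.30, 0.50):
    print(effect, round(correlation_power(effect, 214), 2))
```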

Confirmatory factor analytic procedures (using maximum likelihood estimation) were employed 
to estimate the WBLT-Form C’s ability to represent these data for both its proposed five-factor 
structure and a unidimensional structure (as outlined in Bodie et al., 2011). Commonly used fit 
indexes and comparison thresholds were utilized: The comparative fit index (CFI) above .90, the 
standardized root mean square residual (SRMR) below .10, and the root mean square error of 
approximation (RMSEA) below .08 (Byrne, 2010; Kline, 2005). 
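
As a small illustration of how these cutoffs are applied jointly, the sketch below compares a set of fit indexes against the three thresholds just listed. The function name is hypothetical, and the example call uses the values reported for the five-factor model in the Tests of Model Dimensionality section.

```python
def check_fit(cfi, srmr, rmsea):
    """Compare fit indexes against the cutoffs stated in the text."""
    return {
        "CFI > .90": cfi > 0.90,
        "SRMR < .10": srmr < 0.10,
        "RMSEA < .08": rmsea < 0.08,
    }


# Values reported for the five-factor model
for criterion, met in check_fit(cfi=0.68, srmr=0.07, rmsea=0.03).items():
    print(criterion, "met" if met else "not met")
```

A model can satisfy the SRMR and RMSEA cutoffs while failing the CFI cutoff. This pattern tends to occur when the items are only weakly correlated, because the independence model that CFI uses as its baseline then already fits nearly as well as the tested model.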

Tests of Model Dimensionality 

Five-factor structure. Inspection of fit statistics for the five-factor structure across participants 
indicated poor representation of these data, χ²(734) = 870.94, p < .001, CFI = .68, SRMR = .07, 
RMSEA = .03. 

Unidimensional structure. Inspection of fit statistics for the unidimensional structure across all 
participants indicated poor representation of these data, χ²(740) = 915.75, p < .001, CFI = .59, 
SRMR = .07, RMSEA = .03. 
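
The RMSEA values above can be recovered from the reported chi-square statistics. The sketch below uses the common point-estimate formula with N - 1 in the denominator; some software uses N instead, which is an assumption to check against the package in use. The function name is hypothetical.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


print(round(rmsea(870.94, 734, 214), 2))  # five-factor model
print(round(rmsea(915.75, 740, 214), 2))  # unidimensional model
```

Because RMSEA depends on how far chi-square exceeds its degrees of freedom, models with many degrees of freedom can show a small RMSEA even when CFI indicates poor fit.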

The results of the tests of model dimensionality for the WBLT precluded further analysis of 
listening comprehension. 


Discussion 

Prior studies utilizing the WBLT (Form C or D) as a measure of listening comprehension 
primarily used Anglo-American participants. As previously noted, scale properties may change 
with different populations (Little, 1997; Keaton & Bodie, 2013). Thus, this study had two goals: 
to test the psychometric properties of the WBLT-Form C, and to do so with a Hispanic-
American student population. 

As seen above, results of the tests of model dimensionality provide further empirical evidence 
that the WBLT-C should not be used as an assessment instrument for listening comprehension. 
While internal consistency estimates for our Hispanic-American participants improved over 
those reported by Bodie et al. (2011), the five-factor structure originally proposed by Watson and 
Barker was not supported with these participants. Confirming the findings of Bodie et al. (2011), 
the 40 items of the WBLT-C are, at best, a loosely associated group of measurement items. 
DeVellis (2003) argues that in scale construction, items should be at minimum moderately 
correlated with one another. Such is not the case with this measure. 

Educators and trainers often use listening comprehension tests such as the WBLT-C as a means 
of pretesting and post-testing student listening in classes and in communication training 
workshops. Despite our findings, some instructors may still wish to utilize the WBLT-C as a 
means of stimulating classroom discussion. However, it is very likely that students will see their 
scores as an objective measure of their listening skills. Unfortunately, their scores may give 
them the false impression that their listening is better or worse than it is in actuality, even when 
educators stress to them that the test is only being used to illustrate potential problems in 
common listening contexts. Given this, it is our strong suggestion that educators avoid using the 
Watson-Barker Listening Test. Unfortunately, we cannot suggest a good alternative. 

Bodie et al. (2011) offer several considerations for developing future listening measures. For 
example, they suggest the use of dichotomous measures (i.e., correct/incorrect) is problematic. 
Meaning is often derived from the context and individuals who are interacting. Thus, the “one- 
size-fits-all” approach taken by the WBLT may not accurately reflect the interactive nature of a 
listening context, particularly when deriving meaning from a message. 

This argument may be particularly true for individuals who are bilingual. A rich literature 
focuses on the effects of being bilingual (see Marian & Shook, 2012 for an overview). For 
example, previous research indicates that bilingual persons do not use one language at a time. 
Both languages are active simultaneously. When individuals listen, word activation cues up 
corresponding words regardless of the language to which the word may belong (Marian & 
Spivey, 2003). As a result, bilingual listeners have the potential to map words into either 
language. The cognitive load that results from linguistic competition such as this is known to 
result in some language difficulties (Marian & Shook, 2012). For example, speakers of two or 
more languages may name pictures more slowly (Gollan, Montoya, Fennema-Notestine, & 
Morris, 2005). They are also more likely to experience moments where they have difficulty 
recalling a term, but may be able to remember attributes associated with it (Gollan & Acenas, 
2004). 

When responding to questions of the WBLT-C, participants use information beyond that in the 
verbal message. Two subscales of the WBLT are designed to measure meaning: understanding 
conversational meaning and understanding emotional meaning. However, meaning cannot be 
separated from the larger context of an interaction, so it may not be viable to attempt to measure 
it as a separate component/subscale as done by the WBLT. As Wagner’s (2008) research on 
listening comprehension suggests, second language speakers vary in how they use and process 
nonverbal elements of spoken text. Consequently, second-language learners may have greater 
difficulties decoding nonverbal communication. These findings provide further support for 
claims that attempts to measure listening comprehension should revisit the question of what 
constitutes the basic elements of comprehension (see, for example, Bodie, Worthington, Imhof, 
& Cooper, 2008; Bostrom, 2011). 

Listening scholars only recently began testing the psychometric properties of many early, 
established listening measures, such as the WBLT-C. Not only is it important for scholars to test 
the psychometric properties of listening measures to ensure the soundness of the research they 
conduct; it is also important to test their viability with other ethnic and cultural groups. 

Because the WBLT-C has, so far, been shown to be psychometrically problematic, we were 
unable to fully realize the second goal of our study. However, some listening researchers have 
begun addressing the role of culture (primarily defined by national origin) on differences in 
listening conceptualizations and behaviors. For example, Imhof and Janusik (2006) developed the 
Listening Concepts Inventory (LCI) as a means of identifying cognitive constructs that drive 
listening behavior. Their factor analysis identified four major dimensions associated with 
participants’ subjective perceptions of listening: listening as organizing information, listening as 
relationship building, listening as learning and integrating information, and critical listening. 
Their follow-up study of these dimensions suggests that individual conceptualizations of the 
listening process vary. For example, they found that US participants conceptualized listening 
as a sustained activity, while German participants viewed listening more as an interactive 
situation that focuses on the individual and requires greater monitoring of the conversation. 

Imhof and Janusik note that individual concepts of listening can be described as a composition of 
multiple and independent elements that form a belief system. They go on to conclude that these 
differing belief systems are likely the source of the differences in how German and US young 
adults conceptualize listening. 

More recently, Zohoori (2013) compared US and Iranian students’ perceptions of personal 
listening competence using the Brownell HURIER Listening Profile. At its most basic level, 
listening competency addresses an individual’s proficiency in literal comprehension (e.g., 
identification of main ideas, support material, etc.) and critical comprehension (e.g., recognition 
of personal biases, intended meanings, etc.) (“Speaking & Listening,” 1998). While both groups 
perceive their personal listening competence quite similarly, US students rated themselves as 
somewhat better listeners than did their Iranian counterparts in the areas of hearing, 
remembering, and responding. 

Importantly, however, none of these studies addressed “cultures within cultures.” That is, they 
assume that these nations are culturally homogeneous. A review of listening literature found only 
one study addressing differences between groups within a nation. Dillon and McKenzie (1998) 
examined four US groups: African-, Anglo-, Asian-, and Hispanic-American students. Their 
study explored the influence of ethnicity on listening as well as communication competence, 
approach, and avoidance. They found that “approaching” behaviors, but not avoidance 
behaviors, appear to differ by ethnicity. In general, significant differences were identified across 
the four groups. For example, Anglo-American students averaged higher scores on willingness 
to communicate than did African-Americans, and this finding held true for willingness to 
communicate with either friends or strangers. In contrast, Hispanic-American students reported a 
greater willingness to communicate with strangers than did their Asian-American counterparts. 
The authors acknowledged that the study's relatively small number of minority participants was a 
weakness, one that led them to relax their significance criterion to p < .10. While Dillon and 
McKenzie identify important communicative differences, they focus more on the impact of these 
differences on individual interactions and less on the possible origins of these differences. 

These studies suggest that Hispanic listeners may have unique belief systems that inform their 
conceptualization of listening, and subsequently affect their listening behaviors. 

Conclusion 

To conclude, results of this and the previous Bodie et al. (2011) study have supported the notion 
that the WBLT is not psychometrically sound and strongly suggest that the scale should not be 
used as a measure of listening comprehension. The reality is that cultural differences impact 
how we listen (Beall, 2010). However, despite evidence to the contrary, many listening scholars 
continue to treat nations as if they are composed of a single, homogeneous group. While 
Hispanic-Americans as a group are diverse (Sonderup, 2004), they do share a number of cultural 
commonalities that may inform their listening belief system: A dominant Roman Catholic 
tradition, a strong family structure, and a significant community commitment (Jandt, 2013). 
Hispanic cultures tend to be collectivistic and thus emphasize group activities and shared 
responsibility (Gudykunst, 1998). Hispanic social norms generally stress good manners, 
cooperation, courtesy, harmony, and positive interactions, while discouraging offensive 
behaviors and direct criticisms of others (Guarnero, 2005; Gudykunst, 1998; Klopf & 
McCroskey, 2007; Salimbene, 2000; Smith, 2000). We encourage researchers to focus greater 
attention on the effect of ethnic and cultural factors on listening behavior with Hispanic- 
Americans and other ethnic groups and to address the limitations of this study (e.g., differences 
in English speaking ability, impact of country of origin, individual level of acculturation). 

Table 1 

Descriptive Statistics for the WBLT Subscales and WBLT Total Score (n = 214) 

Subscale                                  Mode   Mean (SD)     Range   Minimum Correct   Maximum Correct
Evaluating Message Content                4      4.26 (1.59)   6       1                 7
Understanding Meaning in Conversations    5      5.33 (1.38)   7       1                 8
Understanding/Remembering Lectures        4      3.64 (1.77)   8       0                 8
Evaluating Emotional Meaning              4      3.95 (1.63)   7       0                 7
Following Instructions & Directions       5      4.61 (1.62)   8       0                 8
Total Score                               22     21.8 (4.99)   25      8                 33

Table 2 

Average Inter-Item Correlations for the Watson-Barker Listening Test (Form C) Factors 

Factors                                     Average r   α
Evaluating message content                  .07         .39
Understanding meaning in conversations      .04         .29
Understanding and remembering lectures      .09         .44
Evaluating emotional meaning in messages    .06         .44
Following instructions and directions       .07         .39


Endnotes 

1 Since data collection for this study was conducted, a newly revised version of the Watson- 
Barker Listening Test (Forms E & F) has been released by Innolect. Clothing and 
technological references have been updated. However, the delivery and testing format, and 
many of the questions are virtually the same. 